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Abstract. We present a randomized algorithm, which, given positive integers n and 
t and a real number < e < 1, computes the number |S(n,t)| ofnxn non-negative 
integer matrices (magic squares) with the row and column sums equal to t within 
relative error e. The computational complexity of the algorithm is polynomial in e~ 1 
and quasi-polynomial in = nt, that is, of the order ]V log . A simplified version of 
the algorithm works in time polynomial in e~ 1 and N and estimates |S(n,t)| within 
a factor of N l ° s N . This simplified version has been implemented. We present results 
of the implementation, state some conjectures, and discuss possible generalizations. 



1. Introduction and main results 

(1.1) Magic squares. Let us fix two positive integers n and t. An n x n magic 
square with the line sum t is an n x n non- negative integer matrix D = (dij) with 
the row and column sums t: 



t for i = 1, .... n and 



n 

£ 



d^ = t for j = 1, ... 



We note that sometimes such matrices are called semi-magic squares, but we follow 
the terminology adopted in modern combinatorics, for example, in Chapter 4 of 
[St97]. 
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Let E(n,t) be the set of all n x n magic squares with the line sum t. In this 
paper, we present a randomized approximation algorithm to compute the number 
|E(n, t)\. The algorithm runs in quasi-polynomial time. More precisely, let N = nt 
(in what follows, we reserve notation N for the sum of the entries of the matrix). 

We present a randomized algorithm, which, for any given positive integers n and 
t and positive e < 1, approximates |E(n, t)\ within relative error e. The computa- 
tional complexity of the algorithm is (l/e)°^ N°( lnN ^ (in the unit cost model). 

From this same approach, one also obtains a simpler, randomized polynomial 
time algorithm which approximates |E(n, t)\ within a factor of N°( lnN K We im- 
plemented the latter algorithm and report on the computational results in Section 



(1.2) Contingency tables. More generally, given positive integers m and n, 
a positive integer vector R = (n,... ,r m ), and a positive integer vector C = 
(ci, . . . , c n ) such that 



the m x n non- negative integer matrices with the row sums ri, . . . , r m and the 
column sums c\ , . . . ,c n are called contingency tables with the margins R and C. 
The problem of computing or estimating efficiently the cardinality |E(i2, C)\ of the 
set of contingency tables with the given margins has been of significant interest, 
see [DE85], [DG95], [D+97], [Mo02], and [CD03] due to connections to statistics, 
representation theory, and symmetric functions. 

Using the Markov Chain Monte Carlo approach, Dyer, Kannan, and Mount 
[D+97] showed how to count contingency tables if the row and column sums are 
sufficiently large, namely, if = O (n 2 m) and Cj = fi (m 2 n) for all They 
presented a randomized algorithm, which, given an e > 0, approximates the number 
|E(i2, C)\ of tables within relative error e in time polynomial in e _1 , n, m, and 
Y^i log ri + Yj log Cj (the bit size of the margins) . It turns out that for large 
margins the number of contingency tables is well-approximated by the volume of 
the transportation polytope of the m x n non-negative matrices with the row sums 
ri and the column sums Cj. The set E(i2, C) of contingency tables can be viewed 
as the set of integer points in that polytope. Subsequently, Morris [Mo02] obtained 
a similar result for the bounds = O (n 2 / 3 m In m) and Cj = O (m 3 / 2 n In n) . 

In addition, for large and Cj there is a heuristic formula for \T,(R, C)\ due to 
Diaconis and Efron [DE85]: 



1.4. 



ri + . . . + r. 



rn 



ci + . . . + c n = N, 



(1.2.1) 
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This formula first approximates C) \ by the volume of the corresponding trans- 

portation polytope and then (since no explicit formula is known for the volume) 
approximates the volume by an explicitly computable integral of a certain density. 
However, there are no proved or even conjectural conclusions on the accuracy of 
this formula. 

At the opposite extreme, when the margins ri,Cj are very small (bounded by 
a constant fixed in advance) relative to the dimensions m and n of the matrix, 
Bekessy, Bekessy, and Komlos [B+72] obtained an asymptotic formula 



(1.2.2) mR,C)\ » — — -exp 

r\\ ■ ■ -r m !ci! • • -c n ! 




This formula reflects the fact that the majority of contingency tables with small 
margins have entries 0, 1, and 2. Also, if the margins are bounded by a constant 
fixed in advance, one can compute the exact value of |£(-R, C)\ in time polynomial 
in m + n by a dynamic programming algorithm. 

Using the dynamic programming approach in a different vein, Cryan and Dyer 
[CD03] constructed a randomized polynomial time approximation algorithm for 
computing \T,(R, C) \ provided the number m of rows (or the number n of columns) 
is fixed in advance. 

In some sense, the case of magic squares m = n with moderately large margins 
t (say, of order n) lies at the core of the remaining hard cases of contingency ta- 
ble enumeration. Our algorithm is quasi-polynomial, and we conjecture that its 
straightforward modification achieves, in fact, a genuinely polynomial time com- 
plexity, and that it naturally extends to a randomized polynomial time algorithm 
to count contingency tables with any margins, see Section 10. A disadvantage of 
our approach is that there seems to be no easy way to generate a random magic 
square, unlike in the approaches of [D+97] and [Mo02]. 

(1.3) Idea of the algorithm. Our algorithm builds on the technique of rapidly 
mixing Markov chains and, in, particular on efficient sampling from log-concave 
densities, as developed in [AK91], [F+94], [FK99], [LV06], see also [Ve05] for a 
survey, the permanent approximation algorithm [J+04], the strongly polynomial 
time algorithm for matrix scaling [L+00], as well as the integral representation of 
\V(R,C)\ from [Ba05] and [Ba07]. 
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Let A = A nXn C M. n be the open (n 2 — 1) -dimensional simplex of all n x n 
positive matrices X = (xij) such that 

n 

Let // be Lebesgue measure on A normalized by the constraint p(A) = 1. Using 
results of [Ba05] , we represent the number of magic squares as an integral 

(1.3.1) \X(n,t)\ = f fdfJL 

J A 

of some continuous density / : A — > R + . Furthermore, as in [Ba05], the density / 
is factored 

(1.3.2) f=p<f>, 

where <f> : A — > R_|_ is log- concave, that is, 

(f)(aX + pY) > <p a (X)^(Y) for all X, Y e A and 

for all a, f3 > such that a + (3 = 1 

and p(X) > 1 for all X e A. Moreover, for any X e A the values p(X) and </>(X) 
are computable in time polynomial in N. More precisely, for any given e > the 
value of <fi can be computed within relative error e in time polynomial in ln(l/e) 
and iV by a deterministic algorithm of [L+00] while the value of p can be computed 
within relative error e in time polynomial in 1/e and iV by a randomized algorithm 
of [J+04]. The algorithm computing <f> seems to work very well in practice. 

The key result of this paper is that there is a threshold T = j\f KlnN for some 
absolute constant k > such that if we define p : A — >■ M + by 

_, v , < P(X) it P {X)<T 
P{X) ~ \ T if p(X) > T 

then the integral 

(1.3.3) / p<p dp 

J A 

approximates the integral 

pcf) d\x = |S(n, t)\ 

within a relative error as small as iV~ n , say. 
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The simplified version of the algorithm consists of computing the integral 



(1.3.4) f </>dfJL 

J A 

using any of the randomized polynomial time algorithms of [AK91], [F+94], [FK99], 
and [LV06] for integrating log-concave densities. Clearly, the value of (1.3.4) ap- 
proximates |E(n, t)\ within a factor of N 0( - lnN \ that is, 

/ (pdiJ<\J:(n,t)\<N KlnN [ </>dfi 

J A J A 

for some absolute constant k > 0. 

The full version of the algorithm consists of estimating (1.3.3) within relative 
error e using any of the randomized polynomial time algorithms of [AK91], [F+94], 
[FK99] , and [LV06] for sampling from log-concave densities as well as the random- 
ized polynomial time algorithm of [J+04] for approximating the permanent of a 
positive matrix. 

Namely, let v be the Borel probability measure on A with the density propor- 
tional to (p. 

Thus we can rewrite (1.3.3) as the product 

CM (//*)• 

We compute the second factor as above. The first factor is approximated by the 
sample mean 



(1.3.5) 




where x±, . . . , x m G A are independent points sampled at random from the measure 
v. Each such point can be sampled in time polynomial in N. The Chebyshev 
inequality implies that to achieve relative error e with probability 2/3 it suffices to 
sample m = O (e" 2 T 2 ) = e - 2 N 0( - lnN ^ points in (1.3.5). 

(1.4) Computational experiments. 

We implemented the simplified version of the algorithm that computes the inte- 
gral (1.3.4). Below, we have tabulated some representative examples of our results 
for various values of t and n. The software and additional data are available at 
[Yo07]. 

For n < 7, we were able to compare the obtained values ( "estimate" in the table 
below) against the exact numbers ( "answer" in the table below is the exact number 
rounded to the three main digits) computed at our request by Jesus De Loera using 
the LattE code, see [L+04]: 
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n 



t 



estimate 



answer 



5 5 1.51 x 10 7 2.21 x 10 7 
10 7.96 x 10 10 7.93 x 10 10 
125 1.55 x 10 27 1.10 x 10 27 

6 6 4.70 x 10 11 6.02 x 10 11 
12 7.12 x 10 16 2.28 x 10 17 
36 2.21 x 10 27 5.62 x 10 27 
216 3.76 x 10 46 3.07 x 10 46 

7 7 1.70 x 10 17 2.16 x 10 17 
14 1.75 x 10 25 2.46 x 10 25 
49 3.00 x 10 42 4.00 x 10 42 
343 3.90 x 10 71 1.28 x 10 72 . 

The larger values of n below seem to be presently beyond the reach of LattE 
or any other available computer code. Thus we compared the obtained values 
("estimate" in the table below) with the Diaconis-Efron heuristic formula (1.2.1) 
("heuristic" in the table below), which is believed to be valid for t 3> n: 

n t estimate heuristic 

12 8 2.72 x 10 48 4.96 x 10 49 

20 4.55 x 10 81 1.68 x 10 82 

15 20 8.3 x 10 119 2.43 x 10 121 

100 7.65 x 10 236 2.71 x 10 237 . 

An interesting feature of the data is that the Diaconis-Efron formula seems to be 
in a reasonable agreement with our computations even at the range t ~ n, where 
the heuristic arguments supporting the formula do not look plausible any longer. 
One can argue, however, that the agreement becomes more reasonable for larger t. 

Finally, we consider the case of very small t, where we compare our results 
("estimate" in the table below) with the asymptotic formula (1.2.2) ("asymptotic" 
in the table below). 

n t estimate asymptotic 

25 5 2.89 x 10 108 6.17 x 10 108 

30 5 1.49 x 10 142 3.02 x 10 142 . 

These results suggest that the integral (1.3.4) approximates |E(n, t)\ quite well, 
conjecturally within a factor of N°^\ if not just 0(1). Moreover, the algo- 
rithm appears to estimate |S(i?, C)\ for general margins (-R, C) with similar ef- 
fectiveness. For example, in a known test case [DE85], [L+04], we have n = 4, 
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R = (220,215,93,64) and C = (108,286,71,127). The correct answer is about 
1.22 x 10 15 whereas our algorithm predicts it to be 1.02 x 10 15 . 

In practice, our algorithm is slower than LattE when the latter is applicable. 
However, as is explained in Section 3.6, our algorithm has the advantage of being 
able to compute when the values of n are moderately large, far beyond those possible 
with exact methods. Typically, we have been able to compute these in the order of a 
few days on current technology. However, the algorithm is highly parallelizable, and 
also, the intermediate output used to estimate |S(n, t)\ can be used to "bootstrap" 
estimates of u)\ for u > t. Exploiting these features substantially decrease 
computing time. Finally, the memory requirements of our implementation are 
modest and have not been an issue in our experiments. 

(1.5) Organization of the paper. 

In Section 2, we describe the density / of formula (1.3.1) and the factorization 
f = pcf> of (1.3.2). 

In Section 3, we describe the algorithm of approximating the number |E(n, t)\ of 
magic squares in detail and state all the bounds for |E(n, t) |, /, 0, and p we need to 
conclude that the algorithm indeed approximates the desired number within relative 
error e in (l/e)°^ N°^ lnN ^ time. We also describe details of the implementation. 

Sections 4-9 are devoted to the proofs. 

In Section 4, we invoke certain classical results, the van der Waerden and the 
Bregman-Minc bounds for the permanent of a non-negative matrix and obtain a 
straightforward corollary that we use later. 

In Section 5, we prove that the total number ( N ^_^ 1 ) of n x n non-negative 

integer matrices with the sum of entries equal to N is at most times bigger 

than the number of n x n magic squares with the line sum t. Also, we prove that the 
maximum of the density / on the simplex A in (1.3.1) does not exceed (^^J 1 .] -1 )- 
In Sections 6-8, we prove the key estimate of the paper, namely, that for any 
a > there is a j3 = (3(a) > such that the probability that a random X e A 
satisfies p(X) > iV' 3111 ^ in (1.3.2) does not exceed N~ an . Section 7 contains 
some standard probabilistic estimates whereas Section 6 contains an estimate of 
the entries of the doubly stochastic scaling of a positive matrix, which may be of 
interest in its own right. Roughly, it states that for a sufficiently generic n x n 
matrix A, all the entries of its doubly stochastic scaling are sufficiently close to 
1/n. 

In Section 9 we state some technical estimates for the log-concave density </> 
which imply that the algorithms of [AK91], [F+94], [FK99], [LV06] for polynomial 
time integration and sampling are indeed applicable. 

Finally, in Section 10 we describe possible extensions of our approach, in partic- 
ular, to contingency tables with equal row sums, but not necessarily column sums 
and vice versa. We also conjecture that the N°( lnN ^ bound can be replaced by 
N ^ so that our approach produces a polynomial time algorithm. 
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2. The integral representation for the number of magic squares 



To obtain the representation (1.3.1), we express the number |E(n, t)\ as the 
expectation of the permanent of an N x N random matrix for N = nt. Let A = (ay) 
be an N x N square matrix. The permanent of A is given by the formula 

N 

per A = Yl a i<r(i)i 

cESn i=l 

where Sn is the symmetric group of all permutations a of the set {1, . . . , N}. 

Let Jt denote the txt matrix with the entries 1/t. For an n x n matrix X = [xij ) , 
the matrix X ® J t denotes the N x N block matrix whose (i, j)th block is the txt 
matrix with the entries Xij/t. 

We recall that a random variable £ has the standard exponential distribution if 




if t < 
if t > 0. 



The following result is a particular case of the general formula of Theorem 1.2 
of [Ba05] and Theorem 4 of [Ba07] for the number of contingency tables of a given 
type. 

(2.1) Theorem. Let X = (x^) be the n x n matrix of independent standard 
exponential random variables x^ . Then 

t N 

l E (MI = 77n2^ E Per(X® J t ) . 



In other words, 

t N f [ " ! 

l E ( n ' t )l = "^jpT J Rn2 P er ( X ® J t) exp < - x ^ ) dx ' 

2 

where dx is the standard Lebesgue measure in the space W 1 , interpreted as the 
space of n x n matrices X = (xij). 
Let 

f 

A = A nXn = < X = (x^) : x^ > for all i,j and x^ = 1 

[ i,j=i 

be the standard (n 2 — l)-dimensional (open) simplex in IR n endowed with the 
probability measure \i that is the normalization of the Lebesgue measure on A nXn . 

Since per [X <g> J t ) is a homogeneous polynomial of degree N = nt in the entries 
x^ of X, we obtain the following integral representation, see Section 4 of [Ba05]. 



(2.2) Corollary. 



Thus we define / : A — > R + in representation (1.3.1) by 
(2-3) f(X)= \ n2 _ m ^ 2n per(X®J t ) for X e A nXn . 

(2.4) Factoring the density. Now we describe how to factor the density f = p<p 
in (1.3.2), where cj> : A — > R + is a log-concave function and p : A — > R + is 
a function which "does not vary much" on A. We employ the notion of matrix 
scaling, see [Si64], [M068], [KK96], [L+ 00]. 

Let X = (xij) be an n x n positive matrix. Then there exists an n x n positive 
matrix Y = (yij) and positive numbers Aj, fa for i = 1, . . . , n such that 

Xij = VijXiHj for i,j = l,...,n 

and Y is doubly stochastic, that is, 

n 

yij = 1 for i = 1, . . . , n and 

i=i 

n 

^^• = 1 for j = l,...,n. 

i=i 

Furthermore, given matrix X, the matrix Y is unique (we call it the doubly 
stochastic scaling of X) while the factors X{ and fij are unique up to a rescaling 
Xi := AjT, /Lij := far -1 for some r > and i = 1, . . . , n. This allows us to define a 
function a : A — > R_|_ by 

n 

apT) = J] (A^) • 

i=l 

Clearly, perX = (pery)cr(X). 

The crucial fact about a that we use is that a is log-concave, that is, 

a(a 1 X 1 +a 2 X 2 ) > a ai (X^a^ (X 2 ) 

for all Xi,X 2 G A and all ai,a 2 > such that a± + a 2 = 1, see [GS02], [Gu06], 
[Ba05], [Ba06]. Also, for any given X, the value of a can be computed within 
relative error e in time polynomial in ln(l/e) and n [L+00]. 

One can easily see that if Y is the doubly stochastic scaling of X then Y <g> J t is 
the doubly stochastic scaling of X ® J t and that 



a{X® J t ) = a\X). 
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Moreover, 
We define 



per (X <g> J t ) = per (Y <g> J t ) a\X). 



f = pep where 
(2.4.1) = -jfi 



N N 

p(X) = — — per (y <g> J t ) and 



(iV + n 2 -l)!AH^ 

= {n 2 _ 1)m 2n N N * 

for iV = nt. Clearly, </> is log-concave and Corollary 2.2 implies that 
(2.4.2) |E(n, t)| = / pcj) dfi. 

J A 

We note that for any given X e A and e > the value of p(^) can be computed 
within relative error e in time polynomial in N and e _1 by the randomized algorithm 
of [J+04]. 

It is convenient to define / and <j) on the set Mat + of positive nxn matrices and 
not just on the simplex A. 

3. Bounds and the detailed description of the algorithm 

In this section we give the detailed description of the algorithm and also sum- 
marize various bounds that we need. We begin with some general bounds on the 
number |E(n, t)\ of magic squares and the density factors p and 4> in (1.3.2) and 
(2.4.1). 

(3.1) Theorem. We have 
(1) 

N + n 2 -l\ (N + n-l\~ 2 [N + n 2 -1 



(2) 



(3) 



n l — 1 / V n — 1 / \ n z — 1 



( t \)n N N 

l<p(X)<^__ /oraH XGA; 



< 0(X) < Z^ 4 "^ M a// IeA. 

rv ; ~ (n 2 - l)\(t\) 2n n 2N 



We will use somewhat cruder estimates which are easier to work with. 
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(3.2) Corollary. We have 
(1) 



(2) 
(3) 



N + n 2 -1\ +\\^ fK r, ,-2n[N + n 2 -l 

2 > > iV + n 

n z — 1 J \ n z — 1 



1 < ppO < (St) n/2 for all X e A; 



0</(X)<( ^ 2 _ i ) for all X e A. 



We prove Theorem 3.1 in Section 5. Corollary 3.2 follows by standard estimates 
via Stirling's formula 

(3.2.1) v / 2^(^)% 1/(12x+1) <r(x + l)<v / 2^(^)% 1/(12x) for x > 0. 

It turns out that most of the time we have p(X) = N 0< - lnN K Now we state the 
key estimate of the paper. 

(3.3) Theorem. For any a > there exists (3 = (3(a) > such that for all positive 
integers n and t such that 

t<e n , 

we have 

fi[x G A nxn : p{X) > N 13 ln ^ | < N~ an . 

We prove Theorem 3.3 in Section 8 having established an estimate for the en- 
tries of the doubly stochastic scaling of a matrix in Section 6 and some standard 
probability bounds in Section 7. 

Finally, we need some technical estimates showing that <fi is sufficiently regular so 
that we can indeed apply integration and sampling algorithms of [AK91], [F+94], 
[FK99], and [LV06]. 

For 

< 5 < -4, 



let us define the 8 -interior of the simplex by 



A s = A s nXn = < X = (xij) : Xij > 6 for all i,j and ^ = 1 > . 
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(3.4) Theorem. We have 
(1) 

/ fdn< [ f d» < (1 - 6n 2 y N ~ n2+1 I fdfi, 
Ja s J a Ja s 

(2) For any 1,7 6 A 5 where X = (xij) and Y = (y.ij) we have 

N 

| In 0(A) - In 0(F) | < — max \xij - y,ij\. 

o ij 

Theorem 3.4 is proven in Section 4 of [Ba05]. For completeness, we present its 
straightforward proof in Section 9. 

(3.5) The algorithm. Now we can describe the algorithm in more detail. First, 
we assume that t < e n . Indeed, for t > n 3 there is a randomized polynomial 
time approximation scheme for computing |E(n, t)\ which is a particular case of the 
algorithm of Dyer, Kannan, and Mount [D+97], see also [Mo02] for a strengthening. 
Second, we assume that e > N~ n , which is not really restrictive since t)\ can 
be computed exactly in time by a dynamic programming algorithm. 

For an < e < 1, let us choose 

<5 = TTTT^- 9 ~ i,nr 6 o TT for small e > 0. 

n 2 (A^ + n 2 -l) n 2 (N + n 2 - 1) 

By Part 1 of Theorem 3.4, the integral 

(3.5.1) f fdn 

Jas 

approximates |E(n, t)\ from below within the relative error e. We factor f = p<p 
as in Section 2.4. By Parts 1 and 2 of Corollary 3.2 and Theorem 3.3 (with a 
sufficiently large a) it follows that for p defined by 



p(X) 

with T = N@ lnN , the integral 



(p(X) if p(X)<T 
\ T if p{X) > T 



(3.5.2) / pcj) dn 

Ja s 

approximates (3.5.1) from below within relative error N~ n < e. 

A simplified, polynomial time algorithm, replaces integral (3.5.2) by the integral 



(3.5.3) 



f (f) d/j,, 

A s 
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which approximates (3.5.2) within a factor of N f31nN . Because <j> is log-concave and 
satisfies the bound of Part 2 of Theorem 3.4, the integral (3.5.3) can be computed 
within relative error e by any of the algorithms of [AK91], [F+94], [FK99], and 
[LV06] in time polynomial in N and e _1 . We use an algorithm of [L+00] to compute 
4>{x) for a given x G A. 

In the more accurate, but also more time consuming, version of the algorithm, 
we write (3.5.2) as 

(/a/*)(LH- 

where v is the probability measure on A s with the density proportional to 4>. Fur- 
thermore, the algorithms of [AK91], [F+94], [FK99], and [LV06] allow us to sample 
independent random points from a probability measure v sufficiently close to z/, 
that is satisfying 

| -v{S)\ < eN- l3lnN foranyBorel 5 C A 5 . 

A single point can be sampled in (l/e)°^ N°( lnN ^ time. We sample m = |~3T 2 e~ 2 ] 
independent random points Xi with respect to measure v and estimate the integral 

(3.5.4) / pdu 

Ja s 

by the sample mean 

(3.5.5) m-^P^i) 

i=i 

By Chebyshev inequality, (3.5.5) approximates (3.5.4) within relative error e with 
probability at least 2/3. We use the algorithm [J+04] to compute p(xi). 

(3.6) Details of the implementation. We implemented a much simplified ver- 
sion of the algorithm, computing the integral (1.3.4) 




In our implementation, we work with the original simplex A, not its 5-interior 
A s . This has never given us any boundary-related trouble in our computational 
experiments. 

The implementation is based on a version of the hit-and-run algorithm of [LV06] , 
see also [Ve05] . We use telescoping with respect to the density <f>. Namely, we pick 
a sufficiently dense uniform subset 



< ti < t 2 < • • • < t m = t, 
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and define a log-concave function ipi by 



ipi{X)=a u {X) for IeA, 

cf. Section 2.4 and formula (2.4.1), in particular. The number m of points is chosen 
by the user so as to be "reasonable" . 

For a given X e A we compute cr(X) by the Sinkhorn balancing (alternate 
scaling of rows and columns of X to the unit sums) [Si64], which seems to work 
very well even for n as large as n = 100. 

Note that the numbers £j are not necessarily integer and that 



(N + n 2 - l)\N\t N 
(n 2 -l)!(t!) 2n iV^ 



cf. (2.4.1). Hence the goal is to compute 

« m— 1 g 

/ ip m d/i = Sx 17 -^1 where 

J A ~T 



i=l 

S'i = / ipi dfi for z = 1, . . . , m. 
J A 

If £i is sufficiently small, the function ipi is close to a constant, so we estimate Si 
by a sample mean of ipi for a set of randomly chosen X e A. In our experiments, 
we often chose £i = 1. To choose a random le A from \i, we choose the entries Xij 
of X independently from the standard exponential distribution and then normalize: 




Xij:=Xij\ x ki | for i,j = l, . . . , n. 
To compute ratios 



J A V'i 

where i/j is the probability measure on A with the density proportional to ipi, we 
sample points X e A from z/j and average the ratios V'i+i(^)/' ; / , i(^)- If U+i is 
sufficiently close to ti, the ratios are close to 1, so a moderate number of sampled 
points is needed. Again, the number of samples is selected by the user. Thus the 
bottleneck of this simplified algorithm consists in sampling random points X e 
A from the probability measure z/j. For that, we iterate the basic "hit-and-run" 
construction. We sample a random point Xq e A from the uniform distribution \i, 
pick a random line I through X in the affine hull of A, sample X\ from the density 
on the interval I n A proportional to the restriction of ipi onto that interval and 
iterate the process with X := X\ as the new starting point. After a number of 
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iterations chosen by the user, the point X\ is accepted as a random sample from 
the measure i^. 

To choose a random line £ through X we first sample annxn matrix L = (Ay) 
of independent standard Gaussian random variables and then center it 

1 n 

Xij ■■= Xij 2 E Au for *' 3 = 1 > • • • ' n - 

U k,l=l 

We then define the line £ through X by 

£={x + tL: ret). 

To choose_apoint X\ G £ fl A, we approximate the restriction of ipi onto £ fl A by a 
function such that ln^^ is a piece- wise linear approximation of the restriction 

of ln^j onto £ fl A. Sampling from the density proportional to ipij reduces then to 
sampling from the exponential distribution. 

4. Preliminaries: estimates on the permanent 

We will use the following bounds for the permanent. 

(4.1) The van der Waerden bound. Let B = (bij) be an iV x N doubly 
stochastic matrix, that is, 

N N 

= 1 for i = 1, . . . , N and ^ b^ = 1 for j = 1, . . . , N 

j=l i=l 

and 

bij>0 for i,j = 1,... ,N. 

Then 

per B > 



N N 



This is the famous van der Waerden bound proved by Falikman [Fa81] and Ego- 
rychev [Eg81], see also Chapter 12 of [LW01]. 

(4.2) The continuous version of the Bregman-Minc bound. Let B = (bij) 
be an iV x N matrix such that 

JV 

^2bij<l for i = l,...,N 

i=i 

and 

bij>0 i,j = l,...,N. 
15 



Furthermore, let 



Si = max bin > for i = 1, .... N. 

3=1,..., N 



Then 

This bound was obtained by Soules [So03]. 

If s i = 1/ri for integers Ti the bound transforms into 

which can be easily deduced from the Mine conjecture proved by Bregman [Br73], 
see also Chapter 11 of [LW01]. 

We will use the following corollary of estimates of Sections 4.1 - 4.2. 

(4.3) Corollary. Let B = (bij) be an N x N doubly stochastic matrix and let 

Si = max ha for i = 1, . . . , N. 

j=l,... ,N 13 



Suppose that 



Then 



N 



Si < 7 for some 7 > 1. 



i=i 



Nl 



< P erB<^) N T^{l + ^ 



<^(2nN)^e^^. 



Proof. The lower bound is the van der Waerden estimate, see Section 4.1. 
Let us define 

2(£) = £mr(i±^)+ln£ for < £ < 1. 

Then g is a concave function, cf. [So03], and by the inequality of Section 4.2, we 
have 

N 

In per B < ^g(sj). 
i=i 
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The function 

JV 

G{x) = ^2g(£,i) for x= (&,... ,£n) 
i=i 

is concave on the simplex defined by the equation £1 + . . . + £n = 7 and inequalities 
£j > for % = 1, . . . , N. It is also symmetric under permutations of £i, . . . , £n- 
Hence the maximum of G is attained at 

£i = ... = 6v = 7/JV, 

and so 



2. 

.AT 

Thus 



In per S < A/# 



P erB<(±) F^l +7 

and the rest follows by Stirling's formula (3.2.1). □ 

We will apply Corollary 4.3 for 7 = 0(ln AT), in which case the ratio of the upper 
and lower bounds is N°( lnN \ 



5. Proof of Theorem 3.1 

To prove the upper bound in Part 1, we note that |E(n, t)\ does not exceed the 
number ofnxn non-negative integer matrices with the sum of entries equal to N, 
which is exactly equal to (^^".j -1 )- 

For non-negative integer vectors R = (ri, . . . , r n ) and C = (ci, . . . , c n ) such that 
r\ + . . . + r n = c\ + . . . + c n = N, let |£(-R, C) | be the number of n x n non- negative 
integer matrices with the row sums r±, . . . , r n and the column sums ci, . . . , c n , cf. 
Section 1.2. As is discussed in [Ba07], see formula (4) of Section 2 there, we have 

\E(R,C)\<\E(n,t)\ for all # and C. 

Since the the total number of pairs (R, C) is ( JV +" 1 " 1 ) , the lower bound follows by 
summing up the previous inequality over all choices of (i?, C) . 
We recall that p is defined by 

N N 

= ^- per (y ® J t ), 

where Y is an n x n doubly stochastic scaling of X and Jt is the t x t matrix filled 
with 1/t, cf. Section 2.4. Hence Y <S> Jt is an A^ x N doubly stochastic matrix and 
the lower bound in Part 2 follows by the van der Waerden bound, see Section 4.1. 
Since the entries of Y (g> J t do not exceed 1/t, by Section 4.2, we have 

/ \ N N (t\) n 
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which completes the proof of Part 2. 
Clearly, 



= Kiiff ' >w > ° forai1 xeA - 

Since cr(X) is log-concave and invariant under permutations of rows and columns 
of X , the maximum of a on A is attained at the n x n matrix A with the entries 
1/n 2 that is the average of all the matrices obtained from any given X e A by 
permutations of rows and columns. It is immediate that a (A) = n~ n and hence 

j. f ^ / j., as (N + n 2 -l)\N\t N _ N 

and the proof of Part 3 follows. □ 
As we remarked before, Corollary 3.2 is obtained by a straightforward application 
of the Stirling formula and a remark that / = pcj). 

6. Bounding the entries of the 
doubly stochastic scaling of a matrix 

In this section we prove the following main result. 

(6.1) Theorem. Let A = (a^) be an n x n positive matrix and let B = (bij) be 
the doubly stochastic scaling of A so that for some Aj, [i 3 - > we have 

aij = bijXiHj for all i,j and 

n n 

b^ = 1 for i = 1, . . . , n and = 1 for j = 1, . . . ,n. 

j=l i=l 

Then, for all k and I, 

In b H < In a H In a kj V] In a« 

n — 2 n — 2 



n I 1 ^ \ 2n-2 w 



Example. Suppose that 1 < a^- < 2 for all i, j. Theorem 6.1 implies that for some 
absolute constant 7 we have 



bin < — for all z, 7. 
18 



Hence by Corollary 4.3, 

^<perB<n°^ 

and hence values of per B vary within up to a polynomial in n factor. In contrast, 

n\ < per A < 2 n n!, 

so values of per A vary within an exponential in n factor. 

This concentration of the permanent of the doubly stochastic scaling of a matrix 
is the basis of our approach. 

The proof of Theorem 6.1 is based on the following two lemmas. 
The first lemma was proved in [L+00], for completeness we include its proof 
here. 

(6.2) Lemma. Let A = (a^) be an n x n positive matrix such that 

n 

and let B — (bij) be the doubly stochastic scaling of A. Then 

n n 



Proof. As is known, B can be obtained from A as the limit of repeated alternate 
scalings of the rows of A to the row sums equal to 1 and of the columns of A to the 
column sums equal to 1 (Sinkhorn balancing), see [Si64]. Hence it suffices to prove 
that under the row (column) scalings, the sum of the logarithms of the entries of 
the matrix can only increase. 

To this end, let C = (c^ ) be a positive nxn matrix with the row sums pi, ■ ■ ■ , p n 
such that p\ + . . . + p n = n and let D — (dij) be the matrix such that 

dij = — for all 

Pi 

In words: we divide the ith row of C by its row sum p^. We note that the sum of 
the entries of D is n and that 

n n 

(Indij — lncy) = — n^^lnpj > 

i,j=l i=l 

because of the arithmetic-geometric mean inequality. 

Column scalings are handled in the same way. □ 

The following result is obtained in [Br73]. For completeness, we provide its proof 
below. 
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(6.3) Lemma. Let A = (a^) be a positive n x n matrix and let B = (pij) be its 
doubly stochastic scaling. Then B is the solution of the optimization problem 

n 

minimize (In 2;^ — lna^) 

i,J = l 

over the set of all n x n doubly stochastic matrices X = (xij). 

Proof. First, we note the minimum is attained on a positive doubly stochastic 
matrix X. If, for example, x\\ = 0, then there are indices i and j such that 
xn > 0, x\j > 0, x^ < 1 and one can make the value of the objective function 
smaller by modifying 



X W • — €. X ij . — X ij £ ^ X 2 \ • — X 2 \ €■ j X t j . — X 2 j ~\ €. 

for a sufficiently small e > 0. This follows since the right derivative of x Inx is —00 
at x = and is finite at any x > 0. 

Since the optimal point X lies in the relative interior of the set of doubly sto- 
chastic matrices, the gradient of the objective function at X should be orthogonal 
to the space ofnxn matrices with the row and column sums equal to 0. 

This gives us the following equations 

In x^ — lnoy = £j + r\j for all i,j 

and some numbers £1, . . . , £ n ; 771, . . . , n n . 
In other words, 



x : 



Q-ijXiHj for \ = e^ and \ij = e r>i for all i,j 



'13 

as desired. □ 

Now we are ready to prove Theorem 6.1. 

Proof of Theorem 6.1. First, we notice that neither the matrix B nor the right 
hand side of the inequality change if we scale 

aij := OijT for all i,j 

and some r > 0. Therefore, without loss of generality, we assume that 

n 

Without loss of generality, we assume that k = I = 1, so our goal is to bound 
611. 
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By Lemma 6.3, matrix B is the solution of the minimization problem 

n 

minimize Xij (lna^j — lna^) 

over the set ofnxn doubly stochastic matrices X = (xij). 
For a real r, let us define the matrix B(r) = (^j(r)) by 

' bn+r if i = j = 1, 

by - r/(n - 1) ifz = l,j^l, 

6ii-r/(n-l) if z^l,j = l, 

I 6 lJ +r/(n-l) 2 if i^lj^l- 

We observe that B(0) = B, that the row and column sums of B(t) are 1 and 
that B(t) is positive for all r from a sufficiently small neighborhood of the origin. 
Therefore, if we let 

n 

f( T )= Yl M r ) ( ln M r ) - lna *j) > 

we must have 

/'(0) = 0. 

Computing the derivative, we get 

/ / (0)=ln6 11 -lna 11 + l 

— 7 ( lnbl i ~ lnai ^ ~ 1 

n — 1 z — 

— ( ln ^l ~~ lna il) - 1 

+ (n-1) 2 ( ln6 ^' _ lna ^ + L 



Rearranging summands, we rewrite the derivative in the form 
/'(0) = (l - (^1)2) (Infcn-lnan) 

-(^+(^Tp)E( ln ^- ln ^) 

-(^i + (^iF)^ (ln ^- lnail) 
i n 

+ (n-l) 2 ^2 (ln6lJ ~ lUa ^ ' 

i,.7=l 



,J-- 
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Since /'(O) = and since by Lemma 6.2 we have 

n n 

we must have 

n 2 — 2n 



(n-iy 



(In 6n - lnon) 



and 



_JT2^( ln6 iJ - lna ij) " J] (In 6 a -lnaa) 

< 0. 

That is, 

(n — 2) (ln&n — lnan) — (ln&ij — lnai-,) — (In 6a — lnaa) < 0. 

j¥i Mi 

In other words, 

ln6 " - ^ E ln ^- - ^ E ln6 ^ 

- lnan " ^r^J2 lnai i -^^J2 lna ^ 

lnfrn < lnan y^ lna ij — ^ y^ lna ii 

On the other hand, if the value of &n is fixed, the maximum value of 

y^lnbij + ^ lnfrg 

is attained at 

fri? = fra = — for all 

3 n-1 

(since the row and column sums of B are equal to 1). 
Therefore, we have 

ln frn < ln an ^_ V] ln ay V] In a a 

n — 2 ^ n — 2 '— J 

j/i ¥i 

2n - 2 2n - 2 

+ _ ln(1 _ 6ll) __ M „_ 1) . 

Since ln(l — frn) < and XH\'=i a *i = n ' this completes the proof. □ 
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7. Probabilistic estimates 
The goal of this section is to prove the following technical estimates. 
(7.1) Lemma. 

(1) Let di, . . . ,a n be independent standard exponential random variables. Then 
for every r > we have 



r {i |>»<<->j<( J)" /2 ; 

(2) Let ai, . . . , a m be independent standard exponential random variables. Then 
for every r > we have 



\ 1 m 

P 



4 \ m / 2 



> r> < \ — 

m ^-^ 

i=i 

(3) Let aij, 1 < i,j < n, be independent standard exponential random variables 
and let 

Ci = max aij for i = 1, . . . , n. 

j=l,... ,n 

Then for every r > we have 



p Ui>> r W r " /2 (»^)' 



Proof. We use the Laplace transform method, see, for example, Appendix A of 
[ASOO]. 

To prove Part 1, let 



1 

b = — } Inai. 
n i 

For r > we get 

P {b < - r } = P {e~ Tb >e Tr }< e^Ee" 16 
by the Markov inequality. Let us choose r = n/2. Then 

= Ej]a- 1/2 =(^y x-^ 2 e- x dxj = T n (l/2) = W 2 



Ee 

i=i 



Hence 

f i n 

a . <- r \ < n n / 2 e- rn / 2 . 
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p/iVln, 



To prove Part 2, let 



b = — Y] a,i 

1=1 



For t > we get 

P{6> r} = P{e Tb > e Tr ) < e" Tr Ee 7 
Let us choose r = m/2. Then 

m , ^+00 \ m 



i=i 

To prove Part 3, let 



E e Tb = Y[ E e a * /2 = (J e" x/2 dxj = 2 m . 



n 

' Ci 



1 n 

n ^ 

i=i 

For r > we get 

P{b> r} <e~ Tr Ee Tb . 

We choose r = m/2. Then 

n 

Ee Tb = JjEe c 2 . 

We have 



E (W 2 ) =E ^ max e a ^ /2 ^ < E [yI^ 1 
r+00 

=n / e _a;/2 dx = riT (1/2) = 
Jo 

and the proof follows. □ 

8. Proof of Theorem 3.3 

Let Mat+ = Mat_|_(n, n) be the set of n x n positive matrices A = (a^j) and let 
us consider the projection \1/ : Mat + (n, n) — > A nXn , where ^(A) = X = (xij) is 
defined by 



-1 



n 



^ a k i for i,j = 1,. . . ,n. 

As is known and easy to check, the push-forward of the exponential measure v on 
Mat + with the density 

exp \ ~ ^2 a ^ 
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is the probability measure fi on A. In other words, if A is a random n x n matrix 
with independent standard exponential entries then X = ^(A) is a random matrix 
from the simplex A sampled in accordance with the uniform probability measure 
\i. Furthermore, the doubly stochastic scalings of A and ^(A) coincide. 

Let us choose A e Mat + , let X = ^(A), and let B be the doubly stochastic 
scaling of A. Then 

N N 

P(X) = -^j- Per (B <g) J t ) • 
In view of Corollary 4.3, the proof of Theorem 3.3 follows from the following result. 

(8.1) Proposition. For any a > there exists (3 = (3(a) > such that for all 
positive integers n and t such that 

t<e n 

the following holds. 

Let A = (aij) be the n x n random matrix with the independent standard ex- 
ponential entries and let B = B(A), B = (bij), be its doubly stochastic scaling. 
Then 



P max bi^j > /?lnivj < N~ 



Proof of Proposition 8.1. Let us introduce random variables 

1 n 

Ui = — > lna^, for i = 1, . . . , n and 
n ^ 
i=i 

1 n 

Vj = — In a^ for j = 1, . . . , n. 

i=l 

Applying Part 1 of Lemma 7.1, we conclude that for some absolute constant 
r > we have 

P{wi < ~r) < 2~ n and P{vj < -r} < 2~ n for all i,j = l,...,n. 
It follows then that one can choose a (3\ = f3\(a) > such that 

P\\i : m < -r\ > (3! lnivl < -N~ an and 
(8.1.1) 1 J 6 

P{|j : Vj < -r\ > /?ilniv| < -N~ an . 

Applying Part 2 of Lemma 7.1 with m = n 2 and using that t < e n we conclude 
that for some constant /?2 = /32(a) > we have 



(8.1.2) 




Let us define 



Cj = max a,ij. 

3 = 1,... ,n 



By Part 3 of Lemma 7.1, for some absolute constant (3^ = /3 3 (a) > we have 

(8.1.3) P{i£c,>/? 3 lniv}< ± AT™. 

Let us define a set ^4 C Mat + of matrices by 

A = = (ay) : \i : Ui < -r\ < IniV, \j : Vj < -r\ < (5 X In N, 

1 n 

— - and 

1 n 1 

From (8.1.1)-(8.1.3) we conclude 

P{A G .4} > 1 — N~ an . 

Let us pick a matrix A e A and let £? be its doubly stochastic scaling. By 
Theorem 6.1, we have 



n , 1 1 

n — 2 n — 2 n - 2 J 



(8.1.4) 



ra . / 1 A \ 2n-2 w 



Let us define 

/ = jz : Wi < — r j and J = jj ; : < — r|, 

so |/|, | J| < /3i IniV. Thus from (8.1.2) and (8.1.4) we deduce that for some constant 
7 = 7(/?i, fo, fo) we have 

&y < < 2 cr /(n - 2) for itI,jtJ. 

Tli Tli 

To complete the proof, we use the estimates 

max bij < 1 for % e I and 

.7 = 1,... ,n 

max 6j 7 - < — c ™^ n 2 -* + for i ^ I. 

i=i,...,n n r-t 
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Summarizing, 



n 



i=l V ' ' / i=i j^/ j € j 

/ n \ «/(«-2) 



< 



n 

<l3\nN 



vi=l 



for some /3 = /?3, 7) as desired. □ 

9. Proof of Theorem 3.4 
We use that both / and cj) are positive homogeneous of degree AT, that is, 

/(AX) = A Ar /(x) and <j>(\X) = X N ' <j){X) 
for all lGMat + (n,n) and A>0 

and monotone, that is 

f(X)<f(Y) and 0(X)<0(y) 

for all X, Y e Mat+(ra, n), X = (xy) , E = (y^) 
such that Xij < y,ij for i, j = 1, . . . , n. 

Among these properties only the monotonicity of <p is not immediately obvious. It 
follows, for example, from the following representation of o~(X), see [MO86] and 
Section 2.4. For a positive matrix X = (x^) we have 

( n V 

n n a(X) = I min Xij&Vj I over au diVj > 

n n 

subject to jf[£i = [1^ = 1. 

i=i j=i 

Let dx be the Lebesgue measure on the hyperplanes Yl7j=i Xi i = cons ^ m the 
space of all n x n matrices X = (xij). Using that f(X) is homogeneous, we get 



/ f(x) dx = (l-Sn 2 ) N+n2 - 1 [ 

J(l-Sn 2 )A J A 



f(x) dx. 



On the other hand, for all X e (l — 5n 2 ) A the matrix E = (y^) defined by 
Vij = x ij + ^ nes m ^5 an d /(E) > f{X), which completes the proof of Part 1. 
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To prove Part 2, let 



a = max x. 



ij Uij | • 



Hence 



Xij <Vij + a<(l + a/ 5) and 

Vij < Xij + a < (1 + a/5) Xij for all z, j 



Using monotonicity and homogeneity of 4> we conclude that 



<t>(X) < (1 + a/5y <t>(Y) and 0(Y) < (1 + a/5) iV 0(X), 



from which the proof follows. 



□ 



10. Concluding remarks 



(10.1) Counting general contingency tables. It is plausible to attempt to 
devise similar algorithms for counting contingency tables with the given row and 
column sums R = (ri, . . . , r m ) and C = (ci, . . . , c n ), where 



cf. Section 1. While the general idea can be easily generalized to this case, cf. 
[Ba05], we were unable so far to prove all the necessary bounds, except in the 
special case when the row sums are equal 

n = . . . = r m = t 

or the column sums are equal 



but not necessarily both. 

Suppose, for example, that the row sums are equal. Modifying the construction 
slightly, one can represent the required number of tables by the integral 



where Q is the set of non-negative m x n matrices with all the row sums equal to 
1 (geometrically, Q is a product of m simplices of dimension (n — 1) each) and [i is 
the Lebesgue measure on Q normalized by the condition fi(Q) = 1. The function 
/ factors into the product / = p<p of a log-concave function and a slowly varying 
function p and all the necessary estimate can be carried through, resulting in a 
randomized polynomial time algorithm approximating the number of tables within 
a factor of N logN and a randomized quasi-polynomial algorithm of (l/e)°^N l ° s N 
complexity to approximate the number of tables within any given relative error 



ri + ■ ■ ■ + r m = ci + . . . + c n = N, 



C\ — ... — c n — t, 




Q 



e > 0. 
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(10.2) Improving the bound. The bottleneck of our algorithm is defined by the 
ratio 

c(n,t) = |£(n,t)|/ (J<f> d M > 

where <p is the log-concave density on the simplex A denned by (2.4.1). Roughly 
speaking, c(n, t) is the main contribution to the computational complexity. We 
proved that c(n, t) = jV°( lnAr ) and some conjectural inequalities for the permanent 
(Conjectures 1.1 and 1.6 of [Sa06]) imply that we can choose the threshold T = 
jyO(i) j n g ec tion 1.3 and therefore one should have c{n,i) = N°^ l \ 
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